Optimized Network Resource Location

1. Field of the Invention 

This invention relates to replication of resources in computer networks. 

2. Background of the Invention 

The advent of global computer networks, such as the Internet, has led to
entirely new and different ways to obtain information. A user of the Internet can now
access information from anywhere in the world, with no regard for the actual location of
either the user or the information. A user can obtain information simply by knowing a
network address for the information and providing that address to an appropriate 
application program such as a network browser. 

The rapid growth in popularity of the Internet has imposed a heavy traffic 
burden on the entire network. Solutions to problems of demand (e.g., better 
accessibility and faster communication links) only increase the strain on the supply. 
Internet Web sites (referred to here as "publishers") must handle ever-increasing
bandwidth needs, accommodate dynamic changes in load, and improve performance for
distant browsing clients, especially those overseas. The adoption of content-rich
applications, such as live audio and video, has further exacerbated the problem. 

To address basic bandwidth growth needs, a Web publisher typically subscribes 
to additional bandwidth from an Internet service provider (ISP), whether in the form of 
larger or additional "pipes" or channels from the ISP to the publisher's premises, or in
the form of large bandwidth commitments in an ISP's remote hosting server collection.
These increments are not always as fine-grained as the publisher needs, and quite often
lead times can cause the publisher's Web site capacity to lag behind demand.

To address more serious bandwidth growth problems, publishers may develop 
more complex and costly custom solutions. The solution to the most common need,
increasing capacity, is generally based on replication of hardware resources and site
content (known as mirroring), and duplication of bandwidth resources. These solutions,
however, are difficult and expensive to deploy and operate. As a result, only the largest
publishers can afford them, since only those publishers can amortize the costs over
many customers (and Web site hits).

A number of solutions have been developed to advance replication and
mirroring. In general, these technologies are designed for use by a single Web site and
do not include features that allow their components to be shared by many Web sites
simultaneously.

Some solution mechanisms offer replication software that helps keep mirrored
servers up-to-date. These mechanisms generally operate by making a complete copy of a
file system. One such system operates by keeping multiple copies of a file
system in synch. Another system provides mechanisms for explicitly and regularly
copying files that have changed. Database systems are particularly difficult to replicate,
as they are continually changing. Several mechanisms allow for replication of databases, 
although there are no standard approaches for accomplishing it. Several companies 
offering proxy caches describe them as replication tools. However, proxy caches differ 
because they are operated on behalf of clients rather than publishers. 

Once a Web site is served by multiple servers, a challenge is to ensure that the
load is appropriately distributed or balanced among those servers. Domain
name-server-based round-robin address resolution causes different clients to be
directed to different mirrors.

Another solution, load balancing, takes into account the load at each server
(measured in a variety of ways) to select which server should handle a particular request.
Load balancers use a variety of techniques to route the request to the appropriate
server. Most of those load-balancing techniques require that each server be an exact
replica of the primary Web site. Load balancers do not take into account the "network
distance" between the client and candidate mirror servers.

Assuming that client protocols cannot easily change, there are two major
problems in the deployment of replicated resources. The first is how to select which
copy of the resource to use. That is, when a request for a resource is made to a single
server, how should the choice of a replica of the server (or of that data) be made? We
call this problem the "rendezvous problem". There are a number of ways to get clients
to rendezvous at distant mirror servers. These technologies, like load balancers, must
route a request to an appropriate server, but unlike load balancers, they take network
performance and topology into account in making the determination.

A number of companies offer products which improve network performance by
prioritizing and filtering network traffic.

Proxy caches provide a way for client aggregators to reduce network resource
consumption by storing copies of popular resources close to the end users. A client
aggregator is an Internet service provider or other organization that brings a large
number of clients operating browsers to the Internet. Client aggregators may use proxy
caches to reduce the bandwidth required to serve web content to these browsers.
However, traditional proxy caches are operated on behalf of Web clients rather than
Web publishers.

Proxy caches store the most popular resources from all publishers, which means
they must be very large to achieve reasonable cache efficiency. (The efficiency of a
cache is defined as the number of requests for resources which are already cached
divided by the total number of requests.)

Proxy caches depend on cache control hints delivered with resources to
determine when the resources should be replaced. These hints are predictive, and are
necessarily often incorrect, so proxy caches frequently serve stale data. In many cases,
proxy cache operators instruct their proxy to ignore hints in order to make the cache
more efficient, even though this causes it to more frequently serve stale data.

Proxy caches hide the activity of clients from publishers. Once a resource is
cached, the publisher has no way of knowing how often it was accessed from the cache.

Summary of the Invention

This invention provides a way for servers in a computer network to off-load
their processing of requests for selected resources by determining a different server (a
"repeater") to process those requests. The selection of the repeater can be made
dynamically, based on information about possible repeaters.

If a requested resource contains references to other resources, some or all of
these references can be replaced by references to repeaters.

Accordingly, in one aspect, this invention is a method of processing resource
requests in a computer network. First a client makes a request for a particular resource
from an origin server, the request including a resource identifier for the particular
resource, the resource identifier sometimes including an indication of the origin server.
Requests arriving at the origin server do not always include an indication of the origin
server; since they are sent to the origin server, they do not need to name it. A
mechanism referred to as a reflector, co-located with the origin server, intercepts the
request from the client to the origin server and decides whether to reflect the request or
to handle it locally. If the reflector decides to handle the request locally, it forwards it to
the origin server; otherwise it selects a "best" repeater to process the request. If the
request is reflected, the client is provided with a modified resource identifier designating
the repeater.

The client gets the modified resource identifier from the reflector and makes a
request for the particular resource from the repeater designated in the modified resource
identifier.

When the repeater gets the client's request, it responds by returning the 
requested resource to the client. If the repeater has a local copy of the resource then it 
returns that copy, otherwise it forwards the request to the origin server to get the 
resource, and saves a local copy of the resource in order to serve subsequent requests. 
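
The repeater behavior described above (return a local copy if one exists, otherwise fetch the resource from the origin server and keep a copy) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in this specification: the `Repeater` class, the in-memory dictionary used as the cache, and the `fetch_from_origin` callback are hypothetical names introduced only for this example.

```python
from typing import Callable, Dict, Tuple


class Repeater:
    """Hypothetical repeater that keeps local copies of origin resources."""

    def __init__(self, fetch_from_origin: Callable[[str, str], bytes]) -> None:
        # fetch_from_origin(origin_server, path) stands in for a real
        # network request to the origin server (an assumption of this sketch).
        self._fetch = fetch_from_origin
        self._cache: Dict[Tuple[str, str], bytes] = {}

    def handle(self, origin_server: str, path: str) -> bytes:
        key = (origin_server, path)
        if key not in self._cache:
            # No local copy: forward the request to the origin server
            # and save the result to serve subsequent requests.
            self._cache[key] = self._fetch(origin_server, path)
        return self._cache[key]
```

With this arrangement only the first request for a given resource reaches the origin server; later requests for the same resource are answered from the repeater's local copy.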

The selection by the reflector of an appropriate repeater to handle the request 
can be done in a number of ways. In the preferred embodiment, it is done by first
pre-partitioning the network into "cost groups" and then determining which cost group the
client is in. Next, from a plurality of repeaters in the network, a set of repeaters is
selected, the members of the set having a low cost relative to the cost group which the
client is in. In order to determine the lowest cost, a table is maintained and regularly
updated to define the cost between each group and each repeater. Then one member of
the set is selected, preferably randomly, as the best repeater.
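
The selection procedure above can be sketched in a few lines. The sketch assumes the client's cost group has already been determined, and it represents the cost table as a nested dictionary mapping each cost group to a per-repeater cost. The function name `select_best_repeater`, the `set_size` parameter, and the group and repeater names are hypothetical, introduced only for illustration.

```python
import random
from typing import Dict, Optional

# cost_table[cost_group][repeater_name] -> cost between that group and repeater
CostTable = Dict[str, Dict[str, float]]


def select_best_repeater(
    client_group: str,
    cost_table: CostTable,
    set_size: int = 2,
    rng: Optional[random.Random] = None,
) -> str:
    """Pick one repeater at random from the lowest-cost candidates."""
    rng = rng or random.Random()
    costs = cost_table[client_group]
    # Form the set of repeaters having a low cost relative to the client's group.
    low_cost_set = sorted(costs, key=lambda name: costs[name])[:set_size]
    # Choose one member of the set randomly as the "best" repeater.
    return rng.choice(low_cost_set)
```

Choosing randomly within the low-cost set, rather than always taking the single cheapest repeater, spreads requests from one cost group across several nearby repeaters.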

If the particular requested resource itself can contain identifiers of other
resources, then the resource may be rewritten (before being provided to the client). In
particular, the resource is rewritten to replace at least some of the resource identifiers
contained therein with modified resource identifiers designating a repeater instead of the
origin server. As a consequence of this rewriting process, when the client requests other
resources based on identifiers in the particular requested resource, the client will make
those requests directly to the selected repeater, bypassing the reflector and origin server
entirely.

Resource rewriting must be performed by reflectors. It may also be performed
by repeaters, in the situation where repeaters "peer" with one another and make copies
of resources which include rewritten resource identifiers that designate a repeater.

In a preferred embodiment, the network is the Internet and the resource
identifier is a uniform resource locator (URL) for designating resources on the Internet,
and the modified resource identifier is a URL designating the repeater and indicating the
origin server (as described in step B3 below), and the modified resource identifier is
provided to the client using a REDIRECT message. Note, only when the reflector is
"reflecting" a request is the modified resource identifier provided using a REDIRECT
message.

In another aspect, this invention is a computer network comprising a plurality of
origin servers, at least some of the origin servers having reflectors associated therewith,
and a plurality of repeaters.

Brief Description of the Drawings

The above and other objects and advantages of the invention will be apparent
upon consideration of the following detailed description, taken in conjunction with the
accompanying drawings, in which the reference characters refer to like parts throughout
and in which:

Figure 1 depicts a portion of a network environment according to the present
invention; and

Figures 2-6 are flow charts of the operation of the present invention.



Detailed Description of the
Presently Preferred Exemplary Embodiments

Overview

Figure 1 shows a portion of a network environment 100 according to the
present invention, wherein a mechanism (reflector 108, described in detail below) at a
server (herein origin server 102) maintains and keeps track of a number of partially
replicated servers or repeaters 104a, 104b, and 104c. Each repeater 104a, 104b, and 104c
replicates some or all of the information available on the origin server 102 as well as
information available on other origin servers in the network 100. Reflector 108 is
connected to a particular repeater known as its "contact" repeater ("Repeater B" 104b in
the system depicted in FIGURE 1). Preferably each reflector maintains a connection with
a single repeater known as its contact, and each repeater maintains a connection with a
special repeater known as its master repeater (e.g., repeater 104m for repeaters 104a,
104b and 104c in Figure 1).

Thus, a repeater can be considered as a dedicated proxy server that maintains a
partial or sparse mirror of the origin server 102, by implementing a distributed coherent
cache of the origin server. A repeater may maintain a (partial) mirror of more than one
origin server. In some embodiments, the network 100 is the Internet and repeaters
mirror selected resources provided by origin servers in response to clients' HTTP
(hypertext transfer protocol) and FTP (file transfer protocol) requests.

A client 106 connects, via the network 100, to origin server 102 and possibly to
one or more repeaters 104a, etc.

Origin server 102 is a server at which resources originate. More generally, the
origin server 102 is any process or collection of processes that provide resources in
response to requests from a client 106. Origin server 102 can be any off-the-shelf Web
server. In a preferred embodiment, origin server 102 is typically a Web server such as
the Apache server or Netscape Communications Corporation's Enterprise™ server.

Client 106 is a processor requesting resources from origin server 102 on behalf of
an end user. The client is typically a user agent (e.g., a Web browser such as
Netscape Communications Corporation's Navigator™) or a proxy for a user agent.
Components other than the reflector 108 and the repeaters 104a, 104b, etc., may be
implemented using commonly available software programs. In particular, this invention
works with any HTTP client (e.g., a Web browser), proxy cache, and Web server. In
addition, the reflector 108 might be fully integrated into the data server 112 (for instance,
in a Web Server). These components might be loosely integrated based on the use of
extension mechanisms (such as so-called add-in modules) or tightly integrated by
modifying the service component specifically to support the repeaters.

Resources originating at the origin server 102 may be static or dynamic. That is,
the resources may be fixed or they may be created by the origin server 102 specifically in
response to a request. Note that the terms "static" and "dynamic" are relative, since a
static resource may change at some regular, albeit long, interval. 

Resource requests from the client 106 to the origin server 102 are intercepted by
reflector 108 which for a given request either forwards the request on to the origin server
102 or conditionally reflects it to some repeater 104a, 104b, etc. in the network 100.
That is, depending on the nature of the request by the client 106 to the origin server 102,
the reflector 108 either serves the request locally (at the origin server 102), or selects one
of the repeaters (preferably the best repeater for the job) and reflects the request to the
selected repeater. In other words, the reflector 108 causes requests for resources from
origin server 102, made by client 106, to be either served locally by the origin server 102
or transparently reflected to the best repeater 104a, 104b, etc. The notion of a best
repeater and the manner in which the best repeater is selected are described in detail
below.

Repeaters 104a, 104b, etc. are intermediate processors used to service client
requests, thereby improving performance and reducing costs in the manner described
herein. Within repeaters 104a, 104b, etc., are any processes or collections of processes
that deliver resources to the client 106 on behalf of the origin server 102. A repeater
may include a repeater cache, used to avoid unnecessary transactions with the origin
server 102.

The reflector 108 is a mechanism, preferably a software program, that intercepts
requests that would normally be sent directly to the origin server 102. While shown in
the drawings as separate components, the reflector 108 and the origin server 102 are
typically co-located, e.g., on a particular system such as data server 112. (As discussed
below, the reflector 108 may even be a "plug in" module that becomes part of the origin
server 102.)

Figure 1 shows only a part of a network 100 according to this invention. A
complete operating network consists of any number of clients, repeaters, reflectors, and
origin servers. Reflectors communicate with the repeater network, and repeaters in the
network communicate with one another.

Uniform Resource Locators

Each location in a computer network has an address which can generally be 
specified as a series of names or numbers. In order to access information, an address for 
that information must be known. For example, on the World Wide Web ("the Web") 
which is a subset of the Internet, the manner in which information address locations are
provided has been standardized into Uniform Resource Locators (URLs). URLs specify
the location of resources (information, data files, etc.) on the network. 

The notion of URLs becomes even more useful when hypertext documents are 
used. A hypertext document is one which includes, within the document itself, links
(pointers or references) to the document itself or to other documents. For example, in
an on-line legal research system, each case may be presented as a hypertext document.
When other cases are cited, links to those cases are provided. In this way, when a
person is reading a case, they can follow the links to read the appropriate parts of cited
cases.

In the case of the Internet in general and the World Wide Web specifically,
documents can be created using a standardized form known as the Hypertext Markup
Language (HTML). In HTML, a document consists of data (text, images, sounds, and
the like), including links to other sections of the same document or to other documents.
The links are generally provided as URLs, and can be in relative or absolute form.
Relative URLs simply omit the parts of the URL which are the same as for the
document including the link, such as the address of the document (when linking to the
same document), etc. In general, a browser program will fill in missing parts of a URL
using the corresponding parts from the current document, thereby forming a fully
formed URL including a fully qualified domain name, etc.
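
How a browser fills in the missing parts of a relative URL can be illustrated with Python's standard `urllib.parse.urljoin`. This is only a sketch of the general behavior; the wrapper name `resolve_link` and the example.com addresses are assumptions made for this example and do not come from the specification.

```python
from urllib.parse import urljoin


def resolve_link(document_url: str, link: str) -> str:
    """Complete a relative link using the parts of the current document's URL."""
    return urljoin(document_url, link)


current = "http://www.example.com/docs/index.html"

# A link relative to the current document's directory.
print(resolve_link(current, "figures/one.html"))
# A link that omits only the scheme and host.
print(resolve_link(current, "/top.html"))
# A link to another section of the same document.
print(resolve_link(current, "#section2"))
```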

A hypertext document may contain any number of links to other documents,
and each of those other documents may be on a different server in a different part of the
world. For example, a document may contain links to documents in Russia, Africa,
China and Australia. A user viewing that document at a particular client can follow any
of the links transparently (i.e., without knowing where the document being linked to
actually resides). Accordingly, the cost (in terms of time or money or resource
allocation) of following one link versus another may be quite significant.

URLs generally have the following form (defined in detail in T. Berners-Lee et al.,
Uniform Resource Locators (URL), Network Working Group, Request for Comments: 1738,
Category: Standards Track, December 1994, located at
"http://ds.internic.net/rfc/rfc1738.txt", which is hereby incorporated herein by
reference):

scheme://host[:port]/url-path

Where "scheme" can be a symbol such as "file" (for a file on the local system), "ftp" (for a
file on an anonymous FTP file server), "http" (for a file on a Web server), and
"telnet" (for a connection to a Telnet-based service). Other schemes can also be used
and new schemes are added every now and then. The port number is optional, the
system substituting a default port number (depending on the scheme) if none is
provided. The "host" field maps to a particular network address for a particular
computer. The "url-path" is relative to the computer specified in the "host" field. A
url-path is typically, but not necessarily, the pathname of a file in a web server directory. 

For example, the following is a URL identifying a file "F" in the path "A/B/C"
on a computer at "www.uspto.gov":

http://www.uspto.gov/A/B/C/F
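
The URL form and the example above can be taken apart with Python's standard `urllib.parse.urlsplit`. The following is a minimal sketch: the function name `split_url` and the `DEFAULT_PORTS` table are assumptions introduced here, and the table lists only the well-known default ports for the schemes named in the text that have one.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Well-known default ports, substituted when the URL gives no port.
DEFAULT_PORTS = {"http": 80, "ftp": 21, "telnet": 23}


def split_url(url: str) -> Tuple[str, Optional[str], Optional[int], str]:
    """Return (scheme, host, port, url-path) for a URL of the form above."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    # The url-path is relative to the host, so the leading "/" is dropped.
    return parts.scheme, parts.hostname, port, parts.path.lstrip("/")


print(split_url("http://www.uspto.gov/A/B/C/F"))
```

For the example URL this yields the scheme "http", the host "www.uspto.gov", the default port 80, and the url-path "A/B/C/F".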

In order to access the file "F" (the resource) specified by the above URL, a 
program (e.g., a browser) running on a user's computer (i.e., a client computer) would
have to first locate the computer (i.e., a server computer) specified by the host name.
I.e., the program would have to locate the server "www.uspto.gov". To do this, it would
access a Domain Name Server (DNS), providing the DNS with the host name
("www.uspto.gov"). The DNS acts as a kind of centralized directory for resolving
addresses from names. If the DNS determines that there is a (remote server) computer
corresponding to the name "www.uspto.gov", it will provide the program with an actual
computer network address for that server computer. On the Internet this is called an
Internet Protocol (or IP) address and it has the form "123.345.456.678". The program
on the user's (client) computer would then use the actual address to access the remote
(server) computer.

The program opens a connection to the HTTP server (Web server) on the
remote computer "www.uspto.gov" and uses the connection to send a request message to
the remote computer (using the HTTP scheme). The message is typically an HTTP
GET request which includes the url-path of the requested resource, "A/B/C/F". The
HTTP server receives the request and uses it to access the resource specified by the
url-path "A/B/C/F". The server returns the resource over the same connection.

Thus, conventionally, HTTP client requests for Web resources at an origin server
102 are processed as follows (see FIGURE 2). (This is a description of the process when
no reflector 108 is installed.)

A1. A browser (e.g., Netscape's Navigator) at the client receives a resource
identifier (i.e., a URL) from a user.

A2. The browser extracts the host (origin server) name from the resource
identifier, and uses a domain name server (DNS) to look up the network
(IP) address of the corresponding server. The browser also extracts a
port number, if one is present, or uses a default port number (the default
port number for http requests is 80).

A3. The browser uses the server's network address and port number to
establish a connection between the client 106 and the host or origin
server 102.

A4. The client 106 then sends a (GET) request over the connection
identifying the requested resource.

A5. The origin server 102 receives the request and,

A6. locates or composes the corresponding resource.

A7. The origin server 102 then sends back to the client 106 a reply containing
the requested resource (or some form of error indicator if the resource is
unavailable). The reply is sent to the client over the same connection as
that on which the request was received from the client.

A8. The client 106 receives the reply.

There are many variations of this basic model. For example, in one variation,
instead of providing the client with the resource, the origin server can tell the client to
re-request the resource by another name. To do so, in A7 the server 102 sends back to
the client 106 a reply called a "REDIRECT" which contains a new URL indicating the
other name. The client 106 then repeats the entire sequence, normally without any user
intervention, this time requesting the resource identified by the new URL.

System Operation 

In this invention reflector 108 effectively takes the place of an ordinary Web 
server or origin server 102. The reflector 108 does this by taking over the origin server's 
IP address and port number. In this way, when a client tries to connect to the origin 
server 102, it will actually connect to the reflector 108. The original Web server (or
origin server 102) must then accept requests at a different network (IP) address, or at the 
same IP address but on a different port number. Thus, using this invention, the server 
referred to in A3-A7 above is actually a reflector 108. 

Note that it is also possible to leave the origin server's network address as it is 
and to let the reflector run at a different address or on a different port. In this way the
reflector does not intercept requests sent to the origin server, but can still be sent
requests addressed specifically to the reflector. Thus the system can be tested and
configured without interrupting its normal operation.

The reflector 108 supports the processing as follows (see FIGURE 3):
upon receipt of a request,

B1. The reflector 108 analyzes the request to determine whether or not to
reflect the request. To do this, first the reflector determines whether the
sender (client 106) is a browser or a repeater. Requests issued by
repeaters must be served locally by the origin server 102. This
determination can be made by looking up the network (IP) address of
the sender in a list of known repeater network (IP) addresses.
Alternatively, this determination could be made by attaching information to
a request to indicate that the request is from a specific repeater, or
repeaters can request resources from a special port other than the one
used for ordinary clients.



B2. If the request is not from a repeater, the reflector looks up the requested
resource in a table (called the "rule base") to determine whether the
resource requested is "repeatable". Based on this determination, the
reflector either reflects the request (B3, described below) or serves the
request locally (B4, described below).

The rule base is a list of regular expressions and associated
attributes. (Regular expressions are well-known in the field of computer
science. A small bibliography of their use is found in Aho, et al.,
"Compilers, Principles, techniques and tools", Addison-Wesley, 1986,
pp. 157-158.) The resource identifier (URL) for a given request is looked
up in the rule base by matching it sequentially with each regular
expression. The first match identifies the attributes for the resource,
namely repeatable or local. If there is no match in the rule base, a default
attribute is used. Each reflector has its own rule base, which is manually
configured by the reflector operator.

B3. To reflect a request (to serve a request locally go to B4),
as shown in FIGURE 4, the reflector determines (B3-1) the best repeater
to reflect the request to, as described in detail below. The reflector then
creates (B3-2) a new resource identifier (URL) (using the requested URL
and the best repeater) that identifies the same resource at the selected
repeater.

It is necessary that the reflection step create a single URL
containing the URL of the original resource, as well as the identity of the
selected repeater. A special form of URL is created to provide this
information. This is done by creating a new URL as follows:

D1. Given a repeater name, scheme, origin server name and path, create a
new URL. If the scheme is "http", the preferred embodiment uses the
following format:

http://<repeater>/<server>/<path>

If the scheme is other than "http", the preferred embodiment uses the
following format:

http://<repeater>/<server>@proxy=<scheme>@/<path>

The reflector can also attach a MIME type to the request, to cause the
repeater to provide that MIME type with the result. This is useful
because many protocols (such as FTP) do not provide a way to attach a
MIME type to a resource. The format is

http://<repeater>/<server>@proxy=<scheme>:<type>@/<path>

This URL is interpreted when received by the repeater.

The reflector then sends (B3-3) a REDIRECT reply containing
this new URL to the requesting client. The HTTP REDIRECT
command allows the reflector to send the browser a single URL to retry
the request.

B4. To serve a request locally, the request is sent by the reflector to
("forwarded to") the origin server 102. In this mode, the reflector acts as
a reverse proxy server. The origin server 102 processes the request in the
normal manner (A5-A7). The reflector then obtains the origin server's
reply to the request, which it inspects to determine if the requested
resource is an HTML document, i.e., one which itself contains resource
identifiers.

B5. If the resource is an HTML document then the reflector rewrites the
HTML document by modifying resource identifiers (URLs) within it, as
described below. The resource, possibly as modified by rewriting, is then
returned in a reply to the requesting client 106.

If the requesting client is a repeater, the reflector may temporarily
disable any cache-control modifiers which the origin server attached to
the reply. These disabled cache-control modifiers are later re-enabled
when the content is served from the repeater. This mechanism makes it
possible for the origin server to prevent resources from being cached at
normal proxy caches, without affecting the behavior of the cache at the
repeater.

B6. Whether the request is reflected or handled locally, details about the
transaction, such as the current time, the address of the requester, the
URL requested, and the type of response generated, are written by the
reflector to a local log file.

By using a rule base (B2), it is possible to selectively reflect resources. There are
a number of reasons that certain particular resources cannot be effectively repeated (and
therefore should not be reflected), for instance:

- the resource is composed uniquely for each request;

- the resource relies on a so-called cookie (browsers will not send cookies
to repeaters with different domain names);

- the resource is actually a program (such as a Java applet) that will run on
the client and that wishes to connect to a service (Java requires that the
service be running on the same machine that provided the applet).

In addition, the reflector can be configured so that requests from certain
network addresses (e.g., requests from clients on the same local area network as the
reflector itself) are never reflected. Also, the reflector may choose not to reflect requests
because the reflector is exceeding its committed aggregate information rate, as described
below.
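
The reflect-or-serve-locally decision of steps B1 and B2 can be sketched as follows. This is an illustration under stated assumptions rather than the specification's implementation: the rule base is modeled as an ordered list of (regular expression, attribute) pairs, matching uses `re.search` (the text does not say whether a pattern must match the whole URL), the default attribute is taken to be "local", and the names `lookup_attribute` and `should_reflect` are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Ordered rule base: (regular expression, attribute), attribute being
# "repeatable" or "local". The first matching rule wins.
RuleBase = List[Tuple[str, str]]


def lookup_attribute(url: str, rule_base: RuleBase, default: str = "local") -> str:
    """Match the URL sequentially against each rule; fall back to a default."""
    for pattern, attribute in rule_base:
        if re.search(pattern, url):
            return attribute
    return default


def should_reflect(
    sender_ip: str, url: str, known_repeaters: Iterable[str], rule_base: RuleBase
) -> bool:
    """B1: never reflect a repeater's request. B2: reflect only repeatable URLs."""
    if sender_ip in set(known_repeaters):
        return False
    return lookup_attribute(url, rule_base) == "repeatable"
```

Because the first match wins, more specific rules (for example, a rule marking a script directory as local) should be listed ahead of broader ones.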

A request which is reflected is automatically mirrored at the repeater when the 
repeater receives and processes the request. 

The combination of the reflection process described here and the caching
process described below effectively creates a system in which repeatable resources are
migrated to and mirrored at the selected repeater, while non-repeatable resources are
not mirrored.
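
The reflected URL formats of step D1 can be sketched as a pair of helper functions, one building the URL at the reflector and one recovering its parts at the repeater. This is a hedged illustration: the `=` after `proxy` is this sketch's reading of the format, the use of the tagged form for an "http" resource that carries a MIME type is an assumption, and the names `make_reflected_url`, `parse_reflected_path`, and the example host names are hypothetical.

```python
from typing import Optional, Tuple


def make_reflected_url(
    repeater: str, scheme: str, server: str, path: str,
    mime_type: Optional[str] = None,
) -> str:
    """Build a single URL naming both the repeater and the origin server."""
    path = path.lstrip("/")
    if scheme == "http" and mime_type is None:
        return f"http://{repeater}/{server}/{path}"
    # Non-http scheme (or, by assumption, any request with a MIME type).
    tag = f"@proxy={scheme}"
    if mime_type is not None:
        tag += f":{mime_type}"
    return f"http://{repeater}/{server}{tag}@/{path}"


def parse_reflected_path(url_path: str) -> Tuple[str, str, Optional[str], str]:
    """Recover (origin server, scheme, MIME type, path) at the repeater."""
    rest = url_path.lstrip("/")
    if "@proxy=" in rest:
        server, _, tail = rest.partition("@proxy=")
        tag, _, path = tail.partition("@/")
        scheme, _, mime = tag.partition(":")
        return server, scheme, (mime or None), path
    server, _, path = rest.partition("/")
    return server, "http", None, path
```

For example, reflecting "http://www.example.com/A/B/C/F" to a repeater named "repeater-b.example.net" gives "http://repeater-b.example.net/www.example.com/A/B/C/F", from which the repeater can recover the origin server name and the original path.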

Alternate Approach 

Placing the origin server name in the reflected URL is generaUy a good strategy', 
but it may be considered undesirable for aesthetic or fih the case, e.g., of cookies) certain 
25 technical reasons. 

It is possible to avoid die need for placing both the repeater name and the server 
name in the URL. Instead, a ^'family" of names may be created for a given origin server, 
each name identifying one of the repeaters used by. that server. 
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For instance, if www.example.com is the origin server, names for three repeaters 
might be created: 

wr1.example.com
wr2.example.com
wr3.example.com

The name "wr1.example.com" might be an alias for repeater 1, which might also
be known by other names such as "wr1.anotherExample.com" and "wr1.example.edu".

If the repeater can determine by which name it was addressed, it can use this
information (along with a table that associates repeater alias names with origin server
names) to determine which origin server is being addressed. For instance, if repeater 1 is
addressed as "wr1.example.com", then the origin server is "www.example.com"; if it is
addressed as "wr1.anotherExample.com", then the origin server is
"www.anotherExample.com".

The repeater can use two mechanisms to determine by which alias it is 
addressed: 

1. Each alias can be associated with a different IP address. Unfortunately,
   this solution does not scale well, as IP addresses are currently scarce, and
   the number of IP addresses required grows as the product of origin
   servers and repeaters.

2. The repeater can attempt to determine the alias name used by inspecting
   the "host:" tag in the HTTP header of the request. Unfortunately, some
   old browsers still in use do not attach the "host:" tag to a request.
   Reflectors would need to identify such browsers (the browser identity is
   a part of each request) and avoid this form of reflection.
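
The second mechanism can be sketched as a simple table lookup keyed on the "host:" header. This is only an illustration: the table contents, the function name `origin_for_request`, and the dictionary representation of the headers are assumptions, not part of the described system.

```python
# Hypothetical alias table: repeater alias name (lowercase) -> origin server name.
ALIAS_TO_ORIGIN = {
    "wr1.example.com": "www.example.com",
    "wr1.anotherexample.com": "www.anotherExample.com",
}


def origin_for_request(headers):
    """Return the origin server addressed by a request, or None if unknown.

    `headers` is a dict mapping HTTP header names to values.
    """
    # Header names are case-insensitive, so normalize them before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    host = normalized.get("host")
    if host is None:
        # An old browser that sent no "host:" tag; the alias cannot be determined.
        return None
    # Drop an optional ":port" suffix and compare case-insensitively.
    alias = host.split(":", 1)[0].strip().lower()
    return ALIAS_TO_ORIGIN.get(alias)
```

A `None` result corresponds to the case in which this form of reflection should have been avoided by the reflector.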

How a Repeater Handles a Request

When a browser receives a REDIRECT response (as produced in B3), it reissues 
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a request for the resource using the new resource identifier (URL) (A1-A5). Because the 
new identifier refers to a repeater instead of the origin server, the browser now sends a
request for the resource to the repeater, which processes a request as follows, with
reference to Figure 5:

C1. The repeater analyzes the request to determine the network address
    of the requesting client and the path of the resource requested. Included
    in the path is an origin server name (as described above with reference to
    B3).

C2. The repeater uses an internal table to verify that the origin server belongs
    to a known "subscriber". A subscriber is an entity (e.g., a company) that
    publishes resources (e.g., files) via one or more origin servers. When the
    entity subscribes, it is permitted to utilize the repeater network. The
    subscriber tables described below include the information that is used to
    link reflectors to subscribers.

    If the request is not for a resource from a known subscriber, the
    request is rejected. To reject a request, the repeater returns a reply
    indicating that the requested resource does not exist.

C3. The repeater then determines whether the requested resource is cached
    locally. If the requested resource is in the repeater's cache it is retrieved.
    On the other hand, if a valid copy of the requested resource is not in the
    repeater's cache, the repeater modifies the incoming URL, creating a
    request that it issues directly to the originating reflector which processes
    it (as in B1-B6). Because this request to the originating reflector is from
    a repeater, the reflector always returns the requested resource rather than
    reflecting the request. (Recall that reflectors always handle requests from
    repeaters locally.) If the repeater obtained the resource from the origin
    server, the repeater then caches the resource locally.

    If a resource is not cached locally, the cache can query its "peer
    caches" to see if one of them contains the resource, before or at the
    same time as requesting the resource from the reflector/origin server. If
    a peer cache responds positively in a limited period of time (preferably a
    small fraction of a second), the resource will be retrieved from the peer
    cache.

C4. The repeater then constructs a reply including the requested resource
    (which was retrieved from the cache or from the origin server) and sends
    that reply to the requesting client.

C5. Details about the transaction, such as the associated reflector, the current
    time, the address of the requester, the URL requested, and the type of
    response generated, are written to a local log file at the repeater.

Note that the bottom row of FIGURE 2 refers to an origin server, or a reflector,
or a repeater, depending on what the URL in step A1 identifies.
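
Steps C1 through C5 can be sketched in a few lines. This is a minimal illustration, not the described implementation: it assumes the request path has the form `/<origin server>/<resource path>`, omits peer caches, and uses hypothetical names (`handle_request`, `fetch_from_reflector`) and plain Python containers for the subscriber table, cache, and log.

```python
def handle_request(client_addr, path, subscribers, cache, fetch_from_reflector, log):
    """Process one request at a repeater and return (status, body)."""
    # C1: the first path component is assumed to be the origin server name.
    origin, _, rest = path.lstrip("/").partition("/")
    resource_path = "/" + rest

    # C2: reject requests for origin servers of unknown subscribers by
    # replying that the resource does not exist.
    if origin not in subscribers:
        log.append((client_addr, path, "rejected"))
        return 404, None

    # C3: serve from the local cache, or fetch from the originating
    # reflector and cache the result.
    key = (origin, resource_path)
    body = cache.get(key)
    source = "cache"
    if body is None:
        body = fetch_from_reflector(origin, resource_path)
        cache[key] = body
        source = "origin"

    # C4 and C5: return the reply and record the transaction.
    log.append((client_addr, path, source))
    return 200, body
```

A second request for the same path is answered from `cache` without calling `fetch_from_reflector` again.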

Selecting the Best Repeater

If the reflector 108 determines that it will reflect the request, it must then select
the best repeater to handle that request (as referred to in step B3-1). This selection is
performed by the Best Repeater Selector (BRS) mechanism described here.

The goal of the BRS is to select, quickly and heuristically, an appropriate repeater
for a given client given only the network address of the client. An appropriate repeater
is one which is not too heavily loaded and which is not too far from the client in terms
of some measure of network distance. The mechanism used here relies on specific,
compact, pre-computed data to make a fast decision. Other, dynamic solutions can also
be used to select an appropriate repeater.

The BRS relies on three pre-computed tables, namely the Group Reduction
Table, the Link Cost Table, and the Load Table. These three tables (described below)
are computed off-line and downloaded to each reflector by its contact in the repeater
network.

The Group Reduction Table places every network address into a group, with
the goal that addresses in a group share relative costs, so that they would have the same
best repeater under varying conditions (i.e., the BRS is invariant over the members of
the group).

The Link Cost Table is a two dimensional matrix which specifies the current
cost between each repeater and each group. Initially, the link cost between a repeater
and a group is defined as the "normalized link cost" between the repeater and the group,
as defined below. Over time, the table will be updated with measurements which more
accurately reflect the relative cost of transmitting a file between the repeater and a
member of the group. The format of the Link Cost Table is <Group ID> <Group
ID> <link cost>, where the Group IDs are given as AS numbers.

The Load Table is a one dimensional table which identifies the current load at 
each repeater. Because repeaters may have different capacities, the load is a value that 
represents the ability of a given repeater to accept additional work. Each repeater sends 
its current load to a central master repeater at regular intervals, preferably at least 
approximately once a minute. The master repeater broadcasts the Load Table to each 
reflector in the network, via the contact repeater. 

A reflector is provided entries in the Load Table only for repeaters which it is
assigned to use. The assignment of repeaters to reflectors is performed centrally by a
repeater network operator at the master repeater. This assignment makes it possible to
modify the service level of a given reflector. For instance, a very active reflector may use 
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many repeaters, whereas a relatively inactive reflector may use few repeaters. 

Tables may also be configured to provide selective repeater service to subscribers
in other ways, e.g., for their clients in specific geographic regions, such as Europe or
Asia.

Measuring Load

In the presently preferred embodiments, repeater load is measured in two
dimensions, namely

1. Requests received by the repeater per time interval (RRPT), and

2. Bytes sent by the repeater per time interval (BSPT).

For each of these dimensions, a maximum capacity setting is set. The maximum
capacity indicates the point at which the repeater is considered to be fully loaded. A
higher RRPT capacity generally indicates a faster processor, whereas a higher BSPT
capacity generally indicates a wider network pipe. This form of load measurement
assumes that a given server is dedicated to the task of repeating.

Each repeater regularly calculates its current RRPT and BSPT, by accumulating
the number of requests received and bytes sent over a short time interval. These
measurements are used to determine the repeater's load in each of these dimensions. If
a repeater's load exceeds its configured capacity, an alarm message is sent to the repeater
network administrator.

The two current load components are combined into a single value indicating
overall current load. Similarly, the two maximum capacity components are combined
into a single value indicating overall maximum capacity. The components are combined
as follows:

    current-load = B × current RRPT + (1 - B) × current BSPT

    max-load = B × max RRPT + (1 - B) × max BSPT

The factor B, a value between 0 and 1, allows the relative weights of RRPT and
BSPT to be adjusted, which favors consideration of either processing power or
bandwidth.

The overall current load and overall maximum capacity values are periodically
sent from each repeater to the master repeater, where they are aggregated in the Load
Table, a table summarizing the overall load for all repeaters. Changes in the Load Table
are distributed automatically to each reflector.

While the preferred embodiment uses a two-dimensional measure of
load, any other measure of load can be used.
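
As a small worked illustration of the weighted combination of RRPT and BSPT described in this section, the following sketch applies the same weighting to either the current or the maximum values. The function name `combined_load` and the sample numbers are assumptions chosen for the example.

```python
def combined_load(rrpt, bspt, b):
    """Combine requests-per-interval and bytes-per-interval into one load value.

    `b` is the weighting factor B, which must lie between 0 and 1.
    """
    if not 0.0 <= b <= 1.0:
        raise ValueError("B must be between 0 and 1")
    return b * rrpt + (1.0 - b) * bspt


# The same function yields current-load and max-load, given the matching inputs.
current_load = combined_load(rrpt=100, bspt=200, b=0.5)
max_load = combined_load(rrpt=400, bspt=800, b=0.5)
```

With B = 1 only the request rate is considered, and with B = 0 only the byte rate is considered.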

Combining Link Costs and Load

The BRS computes the cost of servicing a given client from each eligible
repeater. The cost is computed by combining the available capacity of the candidate
repeater with the cost of the link between that repeater and the client. The link cost is
computed by simply looking it up in the Link Cost Table.

The cost is determined using the following formula:

    threshold = K × max-load
    capacity = max( max-load - current-load, ε )
    capacity = min( capacity, threshold )
    cost = link-cost × threshold / capacity

In this formula, ε is a very small number (epsilon) and K is a tuning factor initially
set to 0.5. This formula causes the cost to a given repeater to be increased, at a rate
defined by K, if its capacity falls below a configurable threshold.
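
The cost formula can be checked with a short sketch. The function name `repeater_cost` and the default value chosen for epsilon are assumptions; K defaults to 0.5 as stated in the text.

```python
def repeater_cost(link_cost, max_load, current_load, k=0.5, epsilon=1e-9):
    """Cost of serving a client from one repeater, per the formula in the text."""
    threshold = k * max_load
    # Spare capacity, kept strictly positive so the division below is safe.
    capacity = max(max_load - current_load, epsilon)
    # Capacity above the threshold gives no further advantage.
    capacity = min(capacity, threshold)
    return link_cost * threshold / capacity
```

For a repeater with max-load 100 and link cost 10, the cost stays at 10 while the current load is 50 or less, doubles to 20 at a load of 75, and grows very large as the load approaches 100.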

Given the cost of each candidate repeater, the BRS selects all repeaters within a
delta factor of the best score. From this set, the result is selected at random.

The delta factor prevents the BRS from repeatedly selecting a single repeater
when scores are similar. It is generally required because available information about load 
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and link costs loses accuracy over time. This factor is tunable. 
Best Repeater Selector (BRS) 

The BRS operates as follows, with reference to FIGURE 6:

Given a client network address and the three tables described above:

E1. Determine which group the client is in using the Group Reduction
    Table.

E2. For each repeater in the Link Cost Table and Load Table, determine that
    repeater's combined cost as follows:

    E2a. Determine the maximum and current load on the repeater (using
         the Load Table).

    E2b. Determine the link cost between the repeater and the client's
         group (using the Link Cost Table).

    E2c. Determine the combined cost as described above.

E3. Select a small set of repeaters with the lowest cost.

E4. Select a random member from the set.

Preferably the results of the BRS processing are maintained in a local cache at
the reflector 108. Thus, if the best repeater has recently been determined for a given
client (i.e., for a given network address), that best repeater can be reused quickly without
being re-determined. Since the calculation described above is based on statically pre-
computed tables, if the tables have not changed then there is no need to re-determine
the best repeater.
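
The selection procedure, including the result cache, can be sketched as follows. Several details are assumptions made for illustration: the delta factor is interpreted as a multiplier on the lowest cost, the Load Table is a dict of `(max_load, current_load)` pairs, the Link Cost Table is a dict keyed by `(repeater, group)`, and `group_of` stands in for the Group Reduction Table lookup.

```python
import random


def select_repeater(client_addr, group_of, link_cost, load,
                    delta=1.1, k=0.5, epsilon=1e-9, rng=random, cache=None):
    """Pick a repeater for a client address, reusing a cached answer if present."""
    if cache is not None and client_addr in cache:
        return cache[client_addr]

    group = group_of(client_addr)                       # step E1
    costs = {}
    for repeater, (max_load, current_load) in load.items():   # step E2
        threshold = k * max_load
        capacity = min(max(max_load - current_load, epsilon), threshold)
        costs[repeater] = link_cost[(repeater, group)] * threshold / capacity

    best = min(costs.values())
    # Step E3: keep every repeater whose cost is within the delta factor of the best.
    candidates = sorted(r for r, c in costs.items() if c <= best * delta)
    choice = rng.choice(candidates)                     # step E4

    if cache is not None:
        cache[client_addr] = choice
    return choice
```

Because the cached answer is only valid while the tables are unchanged, a caller following this sketch would clear `cache` whenever a new table is received.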




Determining the Group Reduction and Link Cost Tables

The Group Reduction Table and Link Cost Table used in BRS processing are
created and regularly updated by an independent procedure referred to herein as
NetMap. The NetMap procedure is run by executing several phases (described below) as
needed.

The term Group is used here to refer to an IP "address group".

The term Repeater Group refers to a Group that contains the IP address of a
repeater.

The term link cost refers to a statically determined cost of transmitting data
between two Groups. In a presently preferred implementation, this is the minimum of
the sums of the costs of the links along each path between them. The link costs of
primary concern here are link costs between a Group and a Repeater Group.

The term relative link cost refers to the link cost relative to other link costs for the
same Group, which is calculated by subtracting the minimum link cost from a Group to
any Repeater Group from each of its link costs to a Repeater Group.

The term Cost Set refers to a set of Groups that are equivalent in regard to Best
Repeater Selection. That is, given the information available, the same repeater
would be selected for any of them.

The NetMap procedure first processes input files to create an internal database
called the Group Registry. These input files describe groups, the IP addresses within
groups, and links between groups, and come from a variety of sources, including publicly
available Internet Routing Registry (IRR) databases, BGP router tables, and probe
services that are located at various points around the Internet and use publicly available
tools (such as "traceroute") to sample data paths. Once this processing is complete, the
Group Registry contains essential information used for further processing, namely (1)
the identity of each group, (2) the set of IP addresses in a given group, (3) the presence
of links between groups indicating paths over which information may travel, and (4) the
cost of sending data over a given link.

The following processes are then performed on the Group Registry file.

Calculate Repeater Group link costs 

The NetMap procedure calculates a "link cost" for transmission of data between
each Repeater Group and each Group in the Group Registry. This overall link cost is
defined as the minimum cost of any path between the two groups, where the cost of a
path is equal to the sum of the costs of the individual links in the path. The link cost
algorithm presented below is essentially the same as algorithm #562 from ACM journal
Transactions on Mathematical Software: "Shortest Path From a Specific Node to All
Other Nodes in a Network" by U. Pape, ACM TOMS 6 (1980) pp. 450-455,
http://www.netlib.org/toms/562.

In this processing, the term Repeater Group refers to a Group that contains the
IP address of a repeater. A group is a neighbor of another group if the Group Registry
indicates that there is a link between the two groups.

For each target Repeater Group T:

• Initialize the link cost between T and itself to zero.
• Initialize the link cost between T and every other Group to infinity.
• Create a list L that will contain Groups that are equidistant from the target
  Repeater Group T.
• Initialize the list L to contain just the target Repeater Group T itself.
• While the list L is not empty:
    • Create an empty list L' of neighbors of members of the list L.
    • For each Group G in the list L:
        • For each Group N that is a neighbor of G:
            • Let cost refer to the sum of the link cost between T and
              G, and the link cost between G and N.
              The cost between T and G was determined in the
              previous pass of the algorithm; the link cost between G
              and N is from the Group Registry.
            • If cost is less than the link cost between T and N:
                • Set the link cost between T and N to cost.
                • Add N to L'.
    • Set L to L'.
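
The link cost pass for a single target Repeater Group can be sketched as below. The representation of the Group Registry as a nested dict (`links[group][neighbor] = cost`) and the function name `link_costs_from` are assumptions for illustration; non-negative link costs are also assumed.

```python
import math


def link_costs_from(target, links):
    """Return the minimum path cost from `target` to every group in `links`."""
    cost = {group: math.inf for group in links}
    cost[target] = 0
    frontier = [target]                     # the list L
    while frontier:
        next_frontier = []                  # the list L'
        for g in frontier:
            for n, step in links.get(g, {}).items():
                candidate = cost[g] + step
                if candidate < cost.get(n, math.inf):
                    cost[n] = candidate
                    # Revisit N in the next pass, since its neighbors may improve too.
                    if n not in next_frontier:
                        next_frontier.append(n)
        frontier = next_frontier            # set L to L'
    return cost
```

A group that cannot be reached from the target keeps an infinite cost.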

Calculate Cost Sets 

A Cost Set is a set of Groups that are equivalent with respect to Best Repeater
Selection. That is, given the information available, the same repeater would be selected
for any of them.

The "cost profile" of a Group G is defined herein as the set of costs between G 
and each Repeater. Two cost profiles are said to be equivalent if the values in one 
profile differ from the corresponding values in the other profile by a constant amount. 

Once a client Group is known, the Best Repeater Selection algorithm relies on
the cost profile for information about the Group. If two cost profiles are equivalent, the 
BRS algorithm would select the same repeater given either profile. 

A Cost Set is then a set of groups that have equivalent cost profiles. 

The effectiveness of this method can be seen, for example, in the case where all 
paths to a Repeater from some Group A pass through some other Group B. The two 
Groups have equivalent cost profiles (and are therefore in the same Cost Set) since
whatever Repeater is best for Group A is also going to be best for Group B, regardless 
of what path is taken between the two Groups. 

By normalizing cost profiles, equivalent cost profiles can be made identical. A 
normalized cost profile is a cost profile in which the minimum cost has the value zero. 
A normalized cost profile is computed by finding the minimum cost in the profile, and 
subtracting that value from each cost in the profile. 
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Cost Sets are then computed using the following algorithm: 

• For each Group G:
    • Calculate the normalized cost profile for G.
    • Look for a Cost Set with the same normalized cost profile.
    • If such a set is found, add G to the existing Cost Set;
    • otherwise, create a new Cost Set with the calculated normalized cost profile,
      containing only G.

The algorithm for finding Cost Sets employs a hash table to reduce the time
necessary to determine whether the desired Cost Set already exists. The hash table uses
a hash value computed from the cost profile of G.
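
The grouping step can be sketched with a Python dict serving as the hash table, keyed on the normalized cost profile. The function name `build_cost_sets`, the tuple representation of a profile (one cost per repeater, in a fixed repeater order), and the integer costs in the example are assumptions.

```python
def build_cost_sets(cost_profiles):
    """Group together all Groups whose normalized cost profiles are identical.

    `cost_profiles` maps each Group to a tuple of costs, one per repeater.
    Returns a dict mapping each normalized profile to its list of Groups.
    """
    cost_sets = {}
    for group, profile in cost_profiles.items():
        lowest = min(profile)
        # Normalize so that the minimum cost in the profile becomes zero.
        normalized = tuple(cost - lowest for cost in profile)
        cost_sets.setdefault(normalized, []).append(group)
    return cost_sets


profiles = {"A": (3, 5, 9), "B": (1, 3, 7), "C": (2, 2, 8)}
sets = build_cost_sets(profiles)
```

Here Groups A and B differ by a constant amount of 2 and therefore share one Cost Set, while C forms its own.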

Each Cost Set is then numbered with a unique Cost Set Index number. Cost
Sets are then used in a straightforward manner to generate the Link Cost Table, which
gives the cost from each Cost Set to each Repeater.

As described below, the Group Reduction Table maps every IP address to one
of these Cost Sets.

Build IP Map

The IP Map is a sorted list of records which map IP address ranges to Link Cost
Table keys. The format of the IP map is:

<base IP address> <max IP address> <Link Cost Table key>

where IP addresses are presently represented by 32-bit integers. The entries are sorted by
descending base address, and by ascending maximum address among equal base
addresses, and by ascending Link Cost Table key among equal base addresses and
maximum addresses. Note that ranges may overlap.

The NetMap procedure generates an intermediate IP map containing a map
between IP address ranges and Cost Set numbers as follows:
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• For each Cost Set S: 

    • For each Group G in S:
        • For each IP address range in G:
            • Add a triple (low address, high address, Cost Set number of
              S) to the IP map.

The IP map file is then sorted by descending base address, and by ascending
maximum address among equal base addresses, and by ascending Cost Set number
among equal base addresses and maximum addresses. The sort order for the base
address and maximum address minimizes the time to build the Group Reduction Table
and produces the proper results for overlapping entries.
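
Building and sorting the intermediate IP map can be sketched with a single composite sort key. The input shapes (`cost_sets` mapping a Cost Set number to its Groups, `ranges_of` mapping a Group to its address ranges) and the function name `build_ip_map` are assumptions for illustration.

```python
def build_ip_map(cost_sets, ranges_of):
    """Return (low, high, cost_set_number) triples in the described sort order."""
    ip_map = []
    for number, groups in cost_sets.items():
        for group in groups:
            for low, high in ranges_of[group]:
                ip_map.append((low, high, number))
    # Descending base address, then ascending maximum address,
    # then ascending Cost Set number.
    ip_map.sort(key=lambda entry: (-entry[0], entry[1], entry[2]))
    return ip_map


example = build_ip_map(
    {1: ["A"], 2: ["B"]},
    {"A": [(100, 199), (300, 399)], "B": [(100, 149)]},
)
```

In `example`, the range starting at 300 sorts first, and the two ranges starting at 100 follow in order of their maximum address.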

Finally, the NetMap procedure creates the Group Reduction Table by processing
the sorted IP map. The Group Reduction Table maps IP addresses (specified by ranges)
into Cost Set numbers. Special processing of the IP map file is required in order to
detect overlapping address ranges, and to merge adjacent address ranges in order to
minimize the size of the Group Reduction Table.

An ordered list of address range segments is maintained, each segment consisting
of a base address B and a Cost Set number N, sorted by base address B. (The
maximum address of a segment is the base address of the next segment minus one.)

The following algorithm is used: 

• Initialize the list with the elements [-infinity, NOGROUP], [+infinity, NOGROUP].

• For each entry in the IP map, in sorted order, consisting of (b, m, s):
    • Insert (b, m, s) in the list (recall that IP map entries are of the form
      (low address, high address, Cost Set number of S)).

• For each reserved LAN address range (b, m):
    • Insert (b, m, LOCAL) in the list.

• For each Repeater at address a:
    • Insert (a, a, REPEATER) in the list.

• For each segment S in the ordered list:
    • Merge S with following segments with the same Cost Set.
    • Create a Group Reduction Table entry with
        • base address from the base address of S,
        • max address = next segment's base - 1,
        • group ID = Cost Set number of S.

A reserved LAN address range is an address range reserved for use by LANs
which should not appear as a global Internet address. LOCAL is a special Cost Set
index different from all others, indicating that the range maps to a client which should
never be reflected. REPEATER is a special Cost Set index different from all others,
indicating that the address range maps to a repeater. NOGROUP is a special Cost Set
index different from all others, indicating that this range of addresses has no known
mapping.
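
The final merge of the ordered segment list into table entries, and the later lookup of a client address (step E1 of the BRS), can be sketched as follows. This illustrates only the merge and lookup, not the segment insertion procedure; the function names, the string values used for the special indexes, and the use of binary search for the lookup are assumptions.

```python
import bisect
import math

NOGROUP, LOCAL, REPEATER = "NOGROUP", "LOCAL", "REPEATER"


def build_group_reduction_table(segments):
    """Turn ordered (base, cost_set) segments into (base, max, cost_set) entries."""
    merged = []
    for base, cost_set in segments:
        if merged and merged[-1][1] == cost_set:
            continue  # same Cost Set as the previous segment: merge into it
        merged.append((base, cost_set))
    table = []
    for i, (base, cost_set) in enumerate(merged):
        if base == math.inf:
            continue  # the closing sentinel covers no addresses
        next_base = merged[i + 1][0] if i + 1 < len(merged) else math.inf
        table.append((base, next_base - 1, cost_set))
    return table


def lookup(table, address):
    """Return the Cost Set index for an address by binary search on base addresses."""
    bases = [entry[0] for entry in table]
    i = bisect.bisect_right(bases, address) - 1
    base, max_address, cost_set = table[i]
    return cost_set if address <= max_address else NOGROUP


segments = [(-math.inf, NOGROUP), (100, 7), (150, 7), (200, NOGROUP),
            (300, LOCAL), (400, NOGROUP), (math.inf, NOGROUP)]
table = build_group_reduction_table(segments)
```

In this example the two adjacent segments for Cost Set 7 collapse into the single entry (100, 199, 7).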

Given (B, M, N), insert an entry in the ordered address list as follows:

• Find the last segment (AB, AN) for which AB is less than or equal to B.
• If AB is less than B, insert a new segment (B, N) after (AB, AN).
• Find the last segment (YB, YN) for which YB is less than or equal to M.
• Replace by (XB, N) any segment (XB, NOGROUP) for which XB is greater
  than B and less than YB.
• If YN is not N, and either YN is NOGROUP or YB is less than or equal to B:
    • Let (ZB, ZN) be the segment following (YB, YN).
    • If M+1 is less than ZB, insert a new segment (M+1, YN)
      before (ZB, ZN).
    • Replace (YB, YN) by (YB, N).



Rewriting HTML Resources 

As explained above with reference to Figure 3 (B5), when a reflector or
repeater serves a resource which itself includes resource identifiers (e.g., an HTML
resource), that resource is modified (rewritten) to pre-reflect resource identifiers (URLs)
of repeatable resources that appear in the resource. Rewriting ensures that when a
browser requests repeatable resources identified by the requested resource, it gets them
from a repeater without going back to the origin server, but that when it requests non-
repeatable resources identified by the resource, it will go directly to the origin
server. Without this optimization, the browser would either make all requests at the
origin server (increasing traffic at the origin server and necessitating far more
redirections from the origin server), or it would make all requests at the repeater (causing
the repeater to redundantly request and copy resources which could not be cached,
increasing the overhead of serving such resources).

Rewriting requires that a repeater has been selected (as described above with
reference to the Best Repeater Selector). Rewriting uses a so-called BASE directive.
The BASE directive lets the HTML identify a different base server. (The base address is
normally the address of the HTML resource.)

Rewriting is performed as follows:

F1. A BASE directive is added at the beginning of the HTML resource, or
    modified where necessary. Normally, a browser interprets relative URLs
    as being relative to the default base address, namely, the URL of the
    HTML resource (page) in which they are encountered. The BASE
    address added specifies the resource at the reflector which originally
    served the resource. This means that unprocessed relative URLs (such as
    those generated by Javascript™ programs) will be interpreted as relative
    to the reflector. Without this BASE address, browsers would combine
    relative addresses with repeater names to create URLs which were not in
    the form required by repeaters (as described above in step D1).

F2. The rewriter identifies directives containing URLs. If the rewriter is
    running in a reflector, it must parse the HTML file to identify these
    directives. If it is running in a repeater, the rewriter may have access to
    pre-computed information that identifies the location of each URL (placed
    in the HTML file in step F4).

F3. For each URL encountered in the resource to be rewritten, the rewriter
    must determine whether the URL is repeatable (as in steps B1-B2). If
    the URL is not repeatable, it is not modified. On the other hand, if the
    URL is repeatable, it is modified to refer to the selected repeater.

F4. After all URLs have been identified and modified, if the resource is being
    served to a repeater, a table is appended at the beginning of the resource
    that identifies the location and content of each URL encountered in the
    resource. (This step is an optimization which eliminates the need for
    parsing HTML resources at the repeater.)

F5. Once all changes have been identified, a new length is computed for the
    resource (page). The length is inserted in the HTTP header prior to
    serving the resource.
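
Steps F1, F3 and F5 can be sketched as below. This is a deliberately simplified illustration: it finds URLs with a regular expression over double-quoted `src` and `href` attributes rather than a full HTML parse, rewrites only absolute URLs on the origin server, and assumes a reflected URL of the form `http://<repeater>/<origin server>/<path>`. The function name `rewrite_html` and the `is_repeatable` callback are hypothetical.

```python
import re

# Matches src="..." and href="..." attributes (double-quoted values only).
URL_ATTR = re.compile(r'(?P<attr>\b(?:src|href))="(?P<url>[^"]*)"', re.IGNORECASE)


def rewrite_html(html, origin, repeater, reflector_base, is_repeatable):
    """Return (rewritten_html, content_length) for one HTML resource."""
    prefix = "http://" + origin + "/"

    def replace(match):
        url = match.group("url")
        # F3: only repeatable URLs are changed to refer to the repeater.
        if url.startswith(prefix) and is_repeatable(url):
            url = "http://" + repeater + "/" + origin + "/" + url[len(prefix):]
        return match.group("attr") + '="' + url + '"'

    body = URL_ATTR.sub(replace, html)
    # F1: add a BASE directive so unprocessed relative URLs resolve at the reflector.
    body = '<base href="' + reflector_base + '">' + body
    # F5: the length is recomputed after all changes have been made.
    return body, len(body.encode("utf-8"))
```

The BASE directive is added after the substitution pass so that its own `href` is never rewritten.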

An extension of HTML, known as XML, is currently being developed. The
process of rewriting URLs will be similar for XML, with some differences in the
mechanism that parses the resource and identifies embedded URLs.

Handling Non-HTTP Protocols

This invention makes it possible to reflect references to resources that are served
by protocols other than HTTP, for instance, the File Transfer Protocol (FTP) and
audio/video stream protocols. However, many protocols do not provide the ability to
redirect requests. It is, however, possible to redirect references before requests are
actually made by rewriting URLs embedded in HTML pages. The following
modifications to the above algorithms are used to support this capability.

In F4, the rewriter rewrites URLs for servers if those servers are included in a
configurable table of cooperating origin servers, or so-called co-servers. The reflector
operator can define this table to include FTP servers and other servers. A rewritten
URL that refers to a non-HTTP resource takes the form:

http://<repeater>/<origin server>@proxy=<scheme>[:<type>]@/<resource>

where <scheme> is a supported protocol name such as "ftp". This URL format is an
alternative to the form shown in B3.

In C3, the repeater looks for a protocol embedded in the arriving request. If a 
protocol is present and the requested resource is not already cached, the repeater uses 
the selected protocol instead of the default HTTP protocol to request the resource when 
serving it and storing it in the cache. 
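
A repeater following this scheme has to split the arriving path into the origin server, the embedded protocol, and the resource. The sketch below shows one way to do that; the function name `parse_repeater_path`, the fallback to "http" when no protocol is embedded, and the example type value "binary" are assumptions, since the text does not enumerate the possible <type> values.

```python
def parse_repeater_path(path):
    """Split '/<origin>[@proxy=<scheme>[:<type>]@]/<resource>' into its parts.

    Returns (origin, scheme, type, resource_path); type is None when absent.
    """
    first, _, resource = path.lstrip("/").partition("/")
    origin, scheme, type_ = first, "http", None
    if first.endswith("@") and "@proxy=" in first:
        # Strip the trailing "@" and separate the origin from the proxy spec.
        origin, _, spec = first[:-1].partition("@proxy=")
        scheme, _, type_part = spec.partition(":")
        type_ = type_part or None
    return origin, scheme, type_, "/" + resource
```

For example, the path `/ftp.example.com@proxy=ftp@/pub/file.zip` yields the origin `ftp.example.com`, the scheme `ftp`, and the resource path `/pub/file.zip`.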



System Configuration and Management

In addition to the processing described above, the repeater network requires
various mechanisms for system configuration and network management. Some of these
mechanisms are described here.
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Reflectors allow their operators to synchronize repeater caches by performing 
publishing operations. The process of keeping repeater caches synchronized is 
described below. Publishing indicates that a resource or collection of resources has 
changed. 

Repeaters and reflectors participate in various types of log processing. The
results of logs collected at repeaters are collected and merged with logs collected at
reflectors, as described below.

Adding Subscribers to the Repeater Network

When a new subscriber is added to the network, information about the
subscriber is entered in a Subscriber Table at the master repeater and propagated to all
repeaters in the network. This information includes the Committed Aggregate Information
Rate (CAIR) for servers belonging to the subscriber, and a list of the repeaters that may
be used by servers belonging to the subscriber.

Adding Reflectors to the Repeater Network 

When a new reflector is added to the network, it simply connects to and 
announces itself to a contact repeater, preferably using a securely encrypted certificate 
including the repeater's subscriber identifier. 

The contact repeater determines whether the reflector network address is
permitted for this subscriber. If it is, the contact repeater accepts the connection and
updates the reflector with all necessary tables (using version numbers to determine
which tables are out of date).

The reflector processes requests during this time, but is not "enabled" (allowed
to reflect requests) until all of its tables are current.
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Keeping Repeater Caches Synchronized 

Repeater caches are coherent, in the sense that when a change to a resource is
identified by a reflector, all repeater caches are notified, and accept the change in a single
transaction.

Only the identifier of the changed resource (and not the entire resource) is
transmitted to the repeaters; the identifier is used to effectively invalidate the
corresponding cached resource at the repeater. This process is far more efficient than
broadcasting the content of the changed resource to each repeater.

A repeater will load the newly modified resource the next time it is requested.

A resource change is identified at the reflector either manually by the operator,
or through a script when files are installed on the server, or automatically through a
change detection mechanism (e.g., a separate process that checks regularly for changes).

A resource change causes the reflector to send an "invalidate" message to its
contact repeater, which forwards the message to the master repeater. The invalidate
message contains a list of resource identifiers (or regular expressions identifying patterns
of resource identifiers) that have changed. (Regular expressions are used to invalidate a
directory or an entire server.) The repeater network uses a two-phase commit process to
ensure that all repeaters correctly invalidate a given resource.

The invalidation process operates as follows:

The master broadcasts a "phase 1" invalidation request to all repeaters indicating
the resources and regular expressions describing sets of resources to be invalidated.

When each repeater receives the phase 1 message, it first places the resource
identifiers or regular expressions into a list of resource identifiers pending invalidation.

Any resource requested (in C3) that is in the pending invalidation list may not be
served from the cache. This prevents the cache from requesting the resource from a
peer cache which may not have received an invalidation notice. Were it to request a
resource in this manner, it might replace the newly invalidated resource by the same,
now stale, data.
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The repeater then compares the resource identifier of each resource in its cache 
against the resource identifiers and regular expressions in the list. 

Each match is invalidated by marking it stale and qj^donally removing it firom the 
cache. This means that a fiiture request for the resource will cause it to retrieve a new 
5 ^ copy of the resource jfi:pmtl;ip,rj?f^ , . . ^. . , 

When the repeater has completed the invalidation, it returns an acknowledgment
to the master. The master waits until all repeaters have acknowledged the invalidation
request.

If a repeater fails to acknowledge within a given period, it is disconnected from
the master repeater. When it reconnects, it will be told to flush its entire cache, which
will eliminate any consistency problem. (To avoid flushing the entire cache, the master
could keep a log of all invalidations performed, sorted by date, and flush only files
invalidated since the last time the reconnecting repeater successfully completed an
invalidation. In the presently preferred embodiments this is not done since it is believed
that repeaters will seldom disconnect.)

When all repeaters have acknowledged invalidation (or timed out), the master
broadcasts a "phase 2" invalidation request to all repeaters. This causes the repeaters to
remove the corresponding resource identifiers and regular expressions from the list of
resource identifiers pending invalidation.

In another embodiment, the invalidation request will be extended to allow a
"server push". In such requests, after phase 2 of the invalidation process has completed,
the repeater receiving the invalidation request will immediately request a new copy of the
invalidated resource to place in its cache.
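
The two-phase exchange described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation of the preferred embodiment: the names Repeater, phase1, phase2, may_serve_from_cache and invalidate are hypothetical, every entry in the invalidation list is treated as a regular expression matched against the whole resource identifier, matched resources are removed rather than merely marked stale, and timeouts and network transport are omitted.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Repeater:
    """Hypothetical repeater: a cache plus a list of patterns pending invalidation."""
    cache: dict = field(default_factory=dict)    # resource identifier -> data
    pending: list = field(default_factory=list)  # compiled regular expressions

    def phase1(self, patterns):
        # Record the patterns first, so matching requests bypass the cache
        # (and any peer cache) until phase 2 arrives.
        compiled = [re.compile(p) for p in patterns]
        self.pending.extend(compiled)
        # Remove every cached resource whose identifier matches a pattern.
        for rid in list(self.cache):
            if any(c.fullmatch(rid) for c in compiled):
                del self.cache[rid]
        return True  # acknowledgment returned to the master

    def phase2(self, patterns):
        # All repeaters have invalidated; drop the patterns from the pending list.
        self.pending = [c for c in self.pending if c.pattern not in patterns]

    def may_serve_from_cache(self, rid):
        blocked = any(c.fullmatch(rid) for c in self.pending)
        return rid in self.cache and not blocked


def invalidate(repeaters, patterns):
    """Master side: phase 1 to every repeater, collect acks, then phase 2."""
    acks = [r.phase1(patterns) for r in repeaters]
    for r in repeaters:
        r.phase2(patterns)
    return all(acks)
```

Keeping the pending list separate from the cache is what prevents a repeater from refilling its cache with stale data from a peer between the two phases.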



Logs and Log Processing

Web server activity logs are fundamental to monitoring the activity in a Web site.
This invention creates "merged logs" that combine the activity at reflectors with the
activity at repeaters, so that a single activity log appears at the origin server showing all
Web resource requests made on behalf of that site, at any repeater.

This merged log can be processed by standard processing tools, as if it had been 
generated locally. 

On a periodic basis, the master repeater (or its delegate) collects logs from each
repeater. The logs collected are merged, sorted by reflector identifier and timestamp,
and stored in a dated file on a per-reflector basis. The merged log for a given reflector
represents the activity of all repeaters on behalf of that reflector. On a periodic basis, as
configured by the reflector operator, a reflector contacts the master repeater to request
its merged logs. It downloads these and merges them with its locally maintained logs,
sorting by timestamp. The result is a merged log that represents all activity on behalf of
repeaters and the given reflector.
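
The final merge step at the reflector can be illustrated with a short Python sketch. It assumes that each input log is already sorted by timestamp; the LogEntry fields and the merge_logs name are hypothetical and are not taken from the described system.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class LogEntry(NamedTuple):
    timestamp: float  # extended-precision timestamp, in seconds
    source: str       # which reflector or repeater handled the request
    request: str      # requested resource identifier


def merge_logs(*logs: Iterable[LogEntry]) -> Iterator[LogEntry]:
    """Merge logs that are each sorted by timestamp into one sorted stream."""
    return heapq.merge(*logs, key=lambda entry: entry.timestamp)
```

Because heapq.merge is lazy, the merged log can be written out entry by entry without holding every input log in memory.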

Activity logs are optionally extended with information important to the repeater
network, if the reflector is configured to do so by the reflector operator. In particular,
an "extended status code" indicates information about each request, such as:

1. request was served by a reflector locally;

2. request was reflected to a repeater;*

3. request was served by a reflector to a repeater;*

4. request for non-repeatable resource was served by repeater;*

5. request was served by a repeater from the cache;

6. request was served by a repeater after filling cache;

7. request pending invalidation was served by a repeater.

(The activities marked with "*" represent intermediate states of a request and do not
normally appear in a final activity log.)

In addition, activity logs contain a duration and extended precision timestamps.
The duration makes it possible to analyze the time required to serve a resource, the
bandwidth used, the number of requests handled in parallel at a given time, and other
quite useful information. The extended precision timestamp makes it possible to
accurately merge activity logs.

Repeaters use the Network Time Protocol (NTP) to maintain synchronized 
clocks. Reflectors may either use NTP or calculate a time bias to provide roughly 
accurate timestamps relative to their contact repeater.

Enforcing Committed Aggregate Information Rate

The repeater network monitors and limits the aggregate rate at which data is
served on behalf of a given subscriber by all repeaters. This mechanism provides the
following benefits:

1. provides a means of pricing repeater service;

2. provides a means for estimating and observing capacity at repeaters;

3. provides a means for preventing clients of a busy site from limiting access to
other sites.

For each subscriber, a "threshold aggregate information rate" (TAIR) is
configured and maintained at the master repeater. This threshold is not necessarily the
committed rate; it may be a multiple of the committed rate, based on a pricing policy.

Each repeater measures the information rate component of each reflector for
which it serves resources, periodically (typically about once a minute), by recording the
number of bytes transmitted on behalf of that reflector each time a request is delivered.
The table thus created is sent to the master repeater once per period. The master
repeater combines the tables from each repeater, summing the measured information rate of
each reflector over all repeaters that serve resources for that reflector, to determine the
"measured aggregate information rate" (MAIR) for each reflector.

If the MAIR for a given reflector is greater than the TAIR for that reflector, the
MAIR is transmitted by the master to all repeaters and to the respective reflector.
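
As a rough illustration of the bookkeeping at the master repeater, the following Python sketch sums per-repeater byte counts into a MAIR per reflector and selects the reflectors whose MAIR exceeds their TAIR. The function names, the dictionary layout of the tables, and the use of bytes per second as the rate unit are assumptions made for the example.

```python
from collections import defaultdict


def measured_aggregate_rates(tables, period_seconds=60.0):
    """Combine per-repeater tables (reflector -> bytes sent this period) into MAIRs."""
    totals = defaultdict(int)
    for table in tables:  # one table per repeater
        for reflector, nbytes in table.items():
            totals[reflector] += nbytes
    # Convert byte counts for the period into a rate in bytes per second.
    return {reflector: nbytes / period_seconds for reflector, nbytes in totals.items()}


def reflectors_over_threshold(mair, tair):
    """Return the MAIRs that must be announced because they exceed the TAIR."""
    return {r: rate for r, rate in mair.items() if rate > tair.get(r, float("inf"))}
```

A reflector with no configured TAIR is treated here as unlimited, which is a choice of the sketch rather than something the description specifies.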

When a reflector receives a request, it determines whether its most recently
calculated MAIR is greater than its TAIR. If this is the case, the reflector
probabilistically decides whether to suppress reflection, by serving the request locally (in
B2). The probability of suppressing the reflection increases as an exponential function
of the difference between the MAIR and the TAIR.
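
The description does not give the exact exponential function, so the following Python sketch uses one plausible form purely as an assumption: the probability grows exponentially with the relative excess (MAIR - TAIR) / TAIR and reaches 1 once the MAIR is twice the TAIR. The names suppression_probability and should_serve_locally and the steepness parameter are hypothetical, and the TAIR is assumed to be positive.

```python
import math
import random


def suppression_probability(mair, tair, steepness=4.0):
    """Probability of serving locally; 0 at or below the TAIR, 1 at twice the TAIR."""
    if mair <= tair:
        return 0.0
    excess = (mair - tair) / tair  # relative overshoot, assumes tair > 0
    if excess >= 1.0:
        return 1.0
    # Normalized exponential: rises slowly at first, then steeply toward 1.
    return (math.exp(steepness * excess) - 1.0) / (math.exp(steepness) - 1.0)


def should_serve_locally(mair, tair, rng=random.random):
    """Draw a random number and decide whether to suppress reflection."""
    return rng() < suppression_probability(mair, tair)
```

Passing the random source as a parameter keeps the decision testable; a deployed reflector would simply use the default.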

Serving a request locally during a peak period may strain the local origin server,
but it prevents this subscriber from taking more than allocated bandwidth from the
shared repeater network.

When a repeater receives a request for a given subscriber (in C2), it determines
whether the subscriber is running near its threshold aggregate information rate. If this is
the case, it probabilistically decides whether to reduce its load by redirecting the request
back to the reflector. The probability increases exponentially as the reflector's aggregate
information rate approaches its limit.

If a request is reflected back to a reflector, a special character string is attached to
the resource identifier so that the receiving reflector will not attempt to reflect it again.
In the current system, this string has the form

"src=overload".

The reflector tests for this string in B2.

The mechanism for limiting Aggregate Information Rate described above is
fairly coarse. It limits at the level of sessions with clients (since once a client has been
reflected to a given repeater, the rewriting process tends to keep the client coming back
to that repeater) and, at best, individual requests for resources. A more fine-grained
mechanism for enforcing TAIR limits within repeaters operates by reducing the
bandwidth consumption of a busy subscriber when other subscribers are competing for
bandwidth.

The fine-grained mechanism is a form of data "rate shaping". It extends the
mechanism that copies resource data to a connection when a reply is being sent to a
client. When an output channel is established at the time a request is received, the
repeater identifies which subscriber the channel is operating for, in C2, and records the
subscriber in a data field associated with the channel. Each time a "write" operation is
about to be made to the channel, the Metered Output Stream first inspects the current
values of the MAIR and TAIR, calculated above, for the given subscriber. If the MAIR
is larger than the TAIR, then the mechanism pauses briefly before performing the write
operation. The length of the pause is proportional to the amount the MAIR exceeds the
TAIR. The pause ensures that tasks sending other resources to other clients, perhaps on
behalf of other subscribers, will have an opportunity to send their data.
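
A minimal Python sketch of such a metered stream follows. Only the idea of pausing in proportion to the excess comes from the description; the class layout, the rates mapping of subscriber to (MAIR, TAIR), the proportionality constant, and the cap on the pause are assumptions chosen for illustration.

```python
import time


class MeteredOutputStream:
    """Wraps an output channel and pauses before writes for over-limit subscribers."""

    def __init__(self, raw, subscriber, rates,
                 seconds_per_unit_excess=0.05, max_pause=1.0, sleep=time.sleep):
        self.raw = raw                # underlying channel with a write() method
        self.subscriber = subscriber  # recorded when the channel is established
        self.rates = rates            # subscriber -> (MAIR, TAIR), updated elsewhere
        self.k = seconds_per_unit_excess
        self.max_pause = max_pause
        self.sleep = sleep

    def pause_length(self):
        mair, tair = self.rates[self.subscriber]
        if mair <= tair:
            return 0.0
        # Pause grows in proportion to the relative excess, up to a cap.
        return min(self.max_pause, self.k * (mair - tair) / tair)

    def write(self, data):
        pause = self.pause_length()
        if pause > 0:
            self.sleep(pause)  # gives other tasks a chance to send their data
        return self.raw.write(data)
```

Reading the rates on every write means a subscriber is slowed or released as soon as the master's latest MAIR arrives, without reopening the channel.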

Repeater Network Resilience 

The repeater network is capable of recovering when a repeater or network
connection fails.

A repeater cannot operate unless it is connected to the master repeater. The
master repeater exchanges critical information with other repeaters, including
information about repeater load, aggregate information rate, subscribers, and link cost.

If a master fails, a "succession" process ensures that another repeater will take
over the role of master, and the network as a whole will remain operational. If a master
fails, or a connection to a master fails through a network problem, any repeater
attempting to communicate with the master will detect the failure, either through an
indication from TCP/IP, or by timing out from a regular "heartbeat" message it sends to
the master.

When any repeater is disconnected from its master, it immediately tries to
reconnect to a series of potential masters based on a configurable file called its
"succession list".

The repeater tries each system on the list in succession until it successfully
connects to a master. If, in this process, it comes to its own name, it takes on the role of
master, and accepts connections from other repeaters. If a repeater which is not at the
top of the list becomes the master, it is called the "temporary master".

A network partition may cause two groups of repeaters each to elect a master.
When the partition is corrected, it is necessary that the more senior master take over the
network. Therefore, when a repeater is temporary master, it regularly tries to reconnect
to any master above it in the succession list. If it succeeds, it immediately disconnects
from all of the repeaters connected to it. When they retry their succession lists, they will
connect to the more senior master repeater.
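
The walk through the succession list can be sketched as follows in Python. The function names and the can_connect callback are hypothetical stand-ins for the actual connection attempt; the sketch only mirrors the rule that a repeater becomes master when it reaches its own name on the list.

```python
def find_master(my_name, succession_list, can_connect):
    """Walk the succession list in order and return the master to use."""
    for candidate in succession_list:
        if candidate == my_name:
            return my_name      # reached our own name: take on the master role
        if can_connect(candidate):
            return candidate    # connected to a more senior repeater
    return None                 # not on the list and nobody reachable


def is_temporary_master(my_name, master, succession_list):
    """A master that is not at the top of the list is only a temporary master."""
    return master == my_name and succession_list[0] != my_name
```

A temporary master would call find_master again at regular intervals; as soon as the result is a more senior repeater, it drops its own connections so that the others retry their lists.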

To prevent losses of data, a temporary master does not accept configuration
changes and does not process log files. In order to take on these tasks, it must be
informed that it is primary master by manual modification of its successor list. Each
repeater regularly reloads its successor list to determine whether it should change its idea
of who the master is.

If a repeater is disconnected from the master, it must resynchronize its cache
when it reconnects to the master. The master can maintain a list of recent cache
invalidations and send to the repeater any invalidations it was not able to process while
disconnected. If this list is not available for some reason (for instance, because the
repeater has been disconnected too long), the repeater must invalidate its entire cache.

A reflector is not permitted to reflect requests unless it is connected to a
repeater. The reflector relies on its contact repeater for critical information, such as load
and Link Cost Tables, and current aggregate information rate. A reflector that is not
connected to a repeater can continue to receive requests and handle them locally.

If a reflector loses its connection with a repeater, due to a repeater failure or
network outage, it continues to operate while it tries to connect to a repeater.

Each time a reflector attempts to connect to a repeater, it uses DNS to identify a
set of candidate repeaters given a domain name that represents the repeater network.
The reflector tries each repeater in this set until it makes a successful contact. Until a
successful contact is made, the reflector serves all requests locally. When a reflector
connects to a repeater, the repeater can tell it to attempt to contact a different repeater;
this allows the repeater network to ensure that no individual repeater has too many
contacts.

When contact is made, the reflector provides the version number of each of its
tables to its contact repeater. The repeater then decides which tables should be updated
and sends appropriate updates to the reflector. Once all tables have been updated, the
repeater notifies the reflector that it may now start reflecting requests.




Repeaters are intentionally designed so that any proxy cache can be used as a
component within them. This is possible because the repeater receives HTTP requests
and converts them to a form recognized by the proxy cache.

On the other hand, several modifications to a standard proxy cache have been or
may be made as optimizations. This includes, in particular, the ability to conveniently
invalidate a resource, the ability to support cache quotas, and the ability to avoid making
an extra copy of each resource as it passes from the proxy cache through the repeater to
the requester.

In a preferred embodiment, a proxy cache is used to implement C3. The proxy
cache is dedicated for use only by one or more repeaters. Each repeater requiring a
resource from the proxy cache constructs a proxy request from the inbound resource
request. A normal HTTP GET request to a server contains only the pathname part of
the URL; the scheme and server name are implicit. (In an HTTP GET request to a
repeater, the pathname part of the URL includes the name of the origin server on behalf
of which the request is being made, as described above.) However, a proxy agent GET
request takes an entire URL. Therefore, the repeater must construct a proxy request
containing the entire URL from the path portion of the URL it receives. Specifically, if
the incoming request takes the form:

GET /<origin server>/<path>

then the repeater constructs a proxy request of the form:

GET http://<origin server>/<path>

and if the incoming request takes the form:



then the repeater constructs a proxy request of the form:

GET <scheme>://<origin server>/<path>
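
The conversion for the first request form, in which the origin server name is the leading path component, might look like the following Python sketch. The function name to_proxy_request is hypothetical, and the optional scheme parameter is an assumption added for the example; the sketch does not attempt to parse any other incoming request form.

```python
def to_proxy_request(request_line, scheme="http"):
    """Turn 'GET /<origin server>/<path>' into 'GET <scheme>://<origin server>/<path>'."""
    method, target = request_line.split(" ", 1)
    # The first path component names the origin server; the rest is the real path.
    origin, _, path = target.lstrip("/").partition("/")
    return f"{method} {scheme}://{origin}/{path}"
```

Anything following the path on the request line, such as a protocol version, is carried through unchanged.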

Cache Control 

HTTP replies contain directives called cache control directives, which are used
to indicate to a cache whether the attached resource may be cached and if so, when it
should expire. A Web site administrator configures the Web site to attach appropriate
directives. Often, the administrator will not know how long a page will be fresh, and
must define a short expiration time to try to prevent caches from serving stale data. In
many cases, a Web site operator will indicate a short expiration time only in order to
receive the requests (or hits) that would otherwise be masked by the presence of a cache.
This is known in the industry as "cache-busting". Although some cache operators may
consider cache-busting to be impolite, advertisers who rely on this information may
consider it imperative.

When a resource is stored in a repeater, its cache directives can be ignored by the
repeater, because the repeater receives explicit invalidation events to determine when a
resource is stale. When a proxy cache is used as the cache at the repeater, the associated
cache directives may be temporarily disabled. However, they must be re-enabled when
the resource is served from the cache to a client, in order to permit the cache-control
policy (including any cache-busting) to take effect.

The present invention contains mechanisms to prevent the proxy cache within a
repeater from honoring cache control directives, while permitting the directives to be
served from the repeater.

When a reflector serves a resource to a repeater in B4, it replaces all cache
directives by modified directives that are ignored by the repeater proxy cache. It does
this by prefixing a distinctive string such as "wr-" to the beginning of the HTTP tag.
Thus, "expires" becomes "wr-expires", and "cache-control" becomes
"wr-cache-control". This prevents the proxy cache itself from honoring the directives.
When a repeater serves a resource in C4, and the requesting client is not another
repeater, it searches for HTTP tags beginning with "wr-" and removes the "wr-". This
allows the clients retrieving the resource to honor the directives.
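
The prefixing and its reversal can be shown with a small Python sketch. The "wr-" prefix and the two header names come from the text above; treating only those two headers as cache directives, and representing headers as a plain dictionary, are simplifying assumptions, and the function names are hypothetical.

```python
PREFIX = "wr-"
CACHE_HEADERS = {"expires", "cache-control"}  # the two tags named in the text


def hide_cache_directives(headers):
    """Reflector side (B4): rename cache directives so the proxy cache ignores them."""
    return {(PREFIX + name if name.lower() in CACHE_HEADERS else name): value
            for name, value in headers.items()}


def restore_cache_directives(headers):
    """Repeater side (C4): strip the prefix before replying to an ordinary client."""
    return {(name[len(PREFIX):] if name.lower().startswith(PREFIX) else name): value
            for name, value in headers.items()}
```

When the requesting client is another repeater, the second function would simply not be applied, so the directives stay hidden from that repeater's proxy cache as well.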

Resource Revalidation

There are several cases where a resource may be cached so long as the origin
server is consulted each time it is served. In one case, the request for the resource is
attached to a so-called "cookie". The origin server must be presented with the cookie to
record the request and determine whether the cached resource may be served or not. In
another case, the request for the resource is attached to an authentication header (which
identifies the requester with a user id and password). Each new request for the resource
must be tested at the origin server to assure that the requester is authorized to access the
resource.

The HTTP 1.1 specification defines a reply header titled "Must-Revalidate"
which allows an origin server to instruct a proxy cache to "revalidate" a resource each
time a request is received. Normally, this mechanism is used to determine whether a
resource is still fresh. In the present invention, Must-Revalidate makes it possible to ask
an origin server to validate a request that is otherwise served from a repeater.

The reflector rule base contains information that determines which resources
may be repeated but must be revalidated each time they are served. For each such
resource, in B4, the reflector attaches a Must-Revalidate header. Each time a request
comes to a repeater for a cached resource marked with a Must-Revalidate header, the
request is forwarded to the reflector for validation, prior to serving the requested
resource.

Cache Quotas

The cache component of a repeater is shared among those subscribers that
reflect clients to that repeater. In order to allow subscribers fair access to storage
facilities, the cache may be extended to support quotas.

Normally, a proxy cache may be configured with a disk space threshold T.
Whenever more than T bytes are stored in the cache, the cache attempts to find
resources to eliminate.

Typically a cache uses the least-recently-used (LRU) algorithm to determine
which resources to eliminate; more sophisticated caches use other algorithms. A cache
may also support several threshold values: for instance, a lower threshold which, when
reached, causes a low priority background process to remove items from the cache, and
a higher threshold which, when reached, prevents resources from being cached until
sufficient free disk space has been reclaimed.

If two subscribers A and B share a cache, and more resources of subscriber A
are accessed during a period of time than resources of subscriber B, then fewer of B's
resources will be in the cache when new requests arrive. It is possible that, due to the
behavior of A, B's resources will never be cached when they are requested. In the
present invention, this behavior is undesirable. To address this issue, the invention
extends the cache at a repeater to support cache quotas.

The cache records the amount of space used by each subscriber in Ds, and
supports a configurable threshold Ts for each subscriber.

Whenever a resource is added to the cache (at C3), the value Ds is updated for
the subscriber providing the resource. If Ds is larger than Ts, the cache attempts to find
resources to eliminate, from among those resources associated with subscriber S. The
cache is effectively partitioned into separate areas for each subscriber.

The original threshold T is still supported. If the sum of reserved segments for
each subscriber is smaller than the total space reserved in the cache, the remaining area
is "common" and subject to competition among subscribers.

Note, this mechanism might be implemented by modifying the existing proxy
cache discussed above, or it might also be implemented without modifying the proxy
cache, if the proxy cache at least makes it possible for an external program to obtain a
list of resources in the cache, and to remove a given resource from the cache.
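
The per-subscriber accounting can be sketched in Python as follows. This is an in-memory illustration only: the QuotaCache class is hypothetical, it uses LRU order for eviction (one of the policies mentioned above), it assumes every subscriber has a configured quota, and it leaves out the global threshold T and the shared "common" area.

```python
from collections import OrderedDict


class QuotaCache:
    """Cache that tracks Ds per subscriber and evicts only within that subscriber."""

    def __init__(self, quotas):
        self.quotas = dict(quotas)           # subscriber -> Ts, in bytes
        self.used = {s: 0 for s in quotas}   # subscriber -> Ds, in bytes
        self.entries = OrderedDict()         # (subscriber, resource id) -> data, LRU first

    def get(self, subscriber, rid):
        key = (subscriber, rid)
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)        # mark as most recently used
        return self.entries[key]

    def add(self, subscriber, rid, data):
        key = (subscriber, rid)
        if key in self.entries:              # replacing an existing copy
            self.used[subscriber] -= len(self.entries.pop(key))
        self.entries[key] = data
        self.used[subscriber] += len(data)
        # If Ds exceeds Ts, evict this subscriber's least recently used resources.
        for old in [k for k in self.entries if k[0] == subscriber]:
            if self.used[subscriber] <= self.quotas[subscriber]:
                break
            self.used[subscriber] -= len(self.entries.pop(old))
```

Because eviction only ever considers the keys of the subscriber that just added a resource, heavy traffic for one subscriber cannot push another subscriber's resources out of its reserved area.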

Rewriting from Repeaters

When a repeater receives a request for a resource, its proxy cache may be
configured to determine whether a peer cache contains the requested resource. If so,
the proxy cache obtains the resource from the peer cache, which can be faster than
obtaining it from the origin server (the reflector). However, a consequence of this is that
rewritten HTML resources retrieved from the peer cache would identify the wrong
repeater. Thus, to allow for cooperating proxy caches, resources are preferably rewritten
at the repeater.

When a resource is rewritten for a repeater, a special tag is placed at the
beginning of the resource. When constructing a reply, the repeater inspects the tag to
determine whether the resource indicates that additional rewriting is necessary. If so, the
repeater modifies the resource by replacing references to the old repeater with references
to the new repeater.

It is only necessary to perform this rewriting when a resource is served to the
proxy cache at another repeater.

Repeater-Side Include

Sometimes, an origin server constructs a custom resource for each request (for
instance, when inserting an advertisement based on the history of the requesting client).
In such a case, that resource must be served locally rather than repeated. Generally, a
custom resource contains, along with the custom information, text and references to
other, repeatable, resources.

The process that assembles a "page" from a text resource and possibly one or
more image resources is performed by the Web browser, directed by HTML. However,
it is not possible using HTML to cause a browser to assemble a page using text or
directives from a separate resource. Therefore, custom resources often necessarily
contain large amounts of static text that would otherwise be repeatable.

To resolve this potential inefficiency, repeaters recognize a special directive
called a "repeater side include". This directive makes it possible for the repeater to
assemble a custom resource, using a combination of repeatable and local resources. In
this way, the static text can be made repeatable, and only the special directive need be
served locally by the reflector.

For example, a resource X might consist of custom directives selecting an
advertising banner, followed by a large text article. To make this resource repeatable, the
Web site administrator must break out a second resource, Y, to select the banner.
Resource X is modified to contain a repeater-side include directive identifying resource
Y, along with the article. Resource Y is created and contains only the custom directives
selecting an ad banner. Now resource X is repeatable, and only resource Y, which is
relatively small, is not repeatable.

When a repeater constructs a reply, it determines whether the resource being 
served is an HTML resource, and if so, scans it for repeater-side include directives. 
Each such directive includes a URL, which the repeater resolves and substitutes in place 
of the directive. The entire resource must be assembled before it is served, in order to 
determine its final size, as the size is included in a reply header ahead of the resource. 
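
The source does not show the syntax of a repeater-side include directive, so the Python sketch below invents one purely for illustration: an HTML comment of the form <!-- repeater-include url="..." -->. The assemble function and the fetch callback, which stands in for resolving the URL at the reflector, are likewise hypothetical.

```python
import re

# Hypothetical directive syntax; the actual directive format is not given in the text.
INCLUDE = re.compile(r'<!--\s*repeater-include\s+url="([^"]+)"\s*-->')


def assemble(html, fetch):
    """Replace each include directive with the resource it names, then size the reply."""
    body = INCLUDE.sub(lambda match: fetch(match.group(1)), html)
    # The size can only be computed after every directive has been resolved.
    headers = {"Content-Length": str(len(body.encode("utf-8")))}
    return headers, body
```

In the example of resources X and Y above, html would be the cached, repeatable resource X, and fetch would retrieve the small non-repeatable resource Y from the reflector on every request.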

Thus, a method and apparatus for dynamically replicating selected resources in
computer networks is provided. One skilled in the art will appreciate that the present
invention can be practiced by other than the described embodiments, which are
presented for purposes of illustration and not limitation, and the present invention is
limited only by the claims that follow.

What is claimed:

1. A method of processing resource requests in a computer network, the
method comprising,

(i) by a client:

(A) making a request for a particular resource from an origin server,
the request including a resource identifier for the particular
resource;

(ii) by a reflector:

(B) intercepting the request from the client to the origin server;

(C) selecting a repeater to process the request;

(D) providing to the client a modified resource identifier designating
the repeater;

(iii) by the client:

(E) receiving the modified resource identifier from the reflector; and

(F) making a request for the particular resource from the repeater
designated in the modified resource identifier;

(iv) by the repeater:

(G) receiving the request from the client; and

(H) returning the requested resource to the client.

2. A method as in claim 1 further comprising, by the repeater:

(I) making a request for the resource from the origin server; and

(J) receiving the resource from the origin server.

3. A method as in claim 1 wherein the selecting of a repeater by the
reflector comprises:

(C1) partitioning the network into groups;

(C2) determining which group the client is in;

(C3) selecting, from a plurality of repeaters in the network, a set of repeaters
having a lowest cost relative to the group which the client is in; and

(C4) selecting as the repeater a member of the selected set of repeaters.

4. A method as in claim 3, wherein the cost of a repeater is a value based on
that repeater's current load and a maximum load for that repeater.

5. A method as in claim 3, wherein the cost of a repeater is a value based on
a predicted cost or speed of transmission between the repeater and a client in the group.

6. A method as in claim 1 wherein the particular resource itself contains at
least one other resource identifier of at least one other resource, the method further
comprising:

rewriting the particular resource to replace at least some of the resource
identifiers contained therein with modified resource identifiers designating a repeater
instead of the origin server.

7. A method as in claim 6 wherein the rewriting is performed by one of the
repeater, the reflector or another repeater.

8. A method of processing resource requests in a computer network, the
method comprising,

(i) by a client:

(A) making a request for a particular resource from an origin server,
the request including a resource identifier for the particular
resource;

(ii) by a reflector:

(B) intercepting the request from the client to the origin server;

(C) determining whether to reflect the request to a repeater;

(D) when the reflector determines not to reflect the request,
forwarding the request to the origin server, otherwise

(D1) selecting a repeater to process the request;

(D2) providing to the client a modified resource identifier
designating the repeater.

9. A method as in claim 8, further comprising, when the reflector
determines to reflect the request,

(iii) by the client:

(E) receiving the modified resource identifier from the reflector; and

(F) making a request for the particular resource from the repeater
designated in the modified resource identifier;

(iv) by the repeater:

(G) receiving the request from the client; and

(H) returning the requested resource to the client.

10. A method as in claim 8 wherein the reflector determines whether to
reflect a request by comparing the resource identifier with regular expression patterns of
repeatable resources.

11. A method as in claim 8, wherein the reflector has a threshold aggregate
information rate (TAIR) associated therewith, and wherein the determining of whether
to reflect the request to a repeater comprises:

determining whether the TAIR of the reflector is exceeded by a measured
aggregate information rate (MAIR) for the reflector, wherein the reflector determines
not to reflect the request when the MAIR exceeds the TAIR for the reflector.

12. A method as in claim 8, wherein the reflector has a threshold aggregate
information rate (TAIR) associated therewith, and wherein the determining of whether
to reflect the request to a repeater comprises:

probabilistically determining whether the TAIR of the reflector is exceeded by a
measured aggregate information rate (MAIR) for the reflector, wherein the reflector
determines not to reflect the request as an exponential function of the difference
between the MAIR and the TAIR.

13. A method as in any of claims 11-12, wherein the MAIR is obtained from
repeaters according to the rate at which they have transmitted data on behalf of the
reflector during a given time interval.

14. A method as in any one of claims 1-12 wherein the network is the
Internet and wherein the resource identifier is a uniform resource locator (URL) for
designating resources on the Internet, and wherein the modified resource identifier is a
URL designating the repeater and indicating the reflector or origin server, and wherein
the modified resource identifier is provided to the client using a REDIRECT message.

15. In a computer network wherein clients request resources from origin
servers, a method comprising:

providing at least one repeater;

providing reflectors at some of the origin servers, each reflector intercepting
client resource requests made to its respective origin server; and

each reflector selectively redirecting client resource requests for certain resources
to one of the repeaters.

16. A method as in claim 15 further comprising, by repeaters in the network:

servicing redirected client resource requests; and

selectively maintaining copies of requested resources,

whereby resources corresponding to redirected resource requests are selectively
migrated from their origin servers to one or more repeaters.

17. A computer network comprising:

a plurality of origin servers, at least some of the origin servers having reflectors
associated therewith;

a plurality of repeaters; and

a plurality of clients,

wherein each reflector is adapted to intercept resource requests made to its
respective origin server and to selectively redirect the resource requests to a dynamically
selected repeater.

18. In a computer network wherein clients request resources from origin
servers, a reflector mechanism associated with an origin server, the reflector mechanism
comprising:

means for intercepting a resource request made by client of an origin server;

means for analyzing the resource request to determine whether to service the
request locally at the origin server;

means for determining a best repeater in the network to service the request when
the analyzing means determines that the request should not be serviced locally; and

means for redirecting the client to the best repeater.

19. A reflector mechanism as in claim 18 wherein the network is partitioned
into groups and the means for determining the best repeater comprises:

means for determining which group the client is in;

means for selecting, from a plurality of repeaters in the network, a set of
repeaters having a lowest cost relative to the group the client is in; and

means for selecting as the best repeater a member of the set of repeaters.

20. A reflector mechanism as in claim 19, wherein the cost of a repeater is a
value based on a predicted cost or speed of transmission between the repeater and a
client in the group.

21. A mechanism as in claim 19, wherein the cost of a repeater is a value
based on that repeater's current load and a maximum load for that repeater.

22. A reflector as in claim 16 wherein the resource itself contains resource
identifiers, the reflector further comprising:

means for rewriting the resource to replace at least some of the resource
identifiers contained therein with modified resource identifiers designating the repeater
instead of the origin server.

23. In a computer network wherein clients request resources from origin
servers, a repeater mechanism comprising:

means for receiving a resource request from a client;

means for determining whether the resource is available locally;

means for, when it is determined that the resource is not available locally,
obtaining the resource from an origin server; and

means for providing the resource to the client.

24. A reflector as in claim 18 wherein the resource itself contains resource
identifiers, the repeater further comprising:

means for rewriting the resource to replace at least some of the resource
identifiers contained therein with modified resource identifiers designating the repeater
instead of the origin server.